HETEROLOGOUS GENE EXPRESSION IN PLANTS



The present invention relates to heterologous gene expression in plants. More
specifically, the invention relates to high expression of heterologous proteins in seeds
by incorporating the gene between seed-specific sequences. Preferably, said
heterologous protein is a single-chain antibody variable fragment (scFv).
The ability to clone and produce a wide range of proteins from diverse sources
became feasible with the advent of recombinant technology. The selection of
expression hosts for commercial production of heterologous proteins is based on the
economics of the production technique, such as the fermentation cost, on the cost of
the purification and on the ability of the host to accomplish the post-translational
modifications needed for full biological activity of the recombinant protein.
Although prokaryotic cells such as Escherichia coli or simple eukaryotic cells such as
the yeast Saccharomyces cerevisiae are in many cases the host cells of choice, these
systems are not sufficient in all cases, and problems can be encountered both in the
yield and in the activity of the protein produced. Alternative systems such as plant cells,
mammalian cells and insect cells may solve the problem of biological activity, but
suffer from a high fermentation cost and a low yield.

Transgenic plants can produce several types of heterologous polypeptides, including
antibodies (ab's) and antibody fragments (Whitelam et al., 1993; Goddijn and Pen,
1995; Hemming, 1995). Antibodies and antibody fragments are interesting from an
industrial point of view: they can be produced against nearly every type of organic
molecule and bind their antigen in a very specific way. However, the major
drawback is the production cost. Plants and plant cells are an interesting alternative for
the production of these molecules, and of other polypeptides that are difficult to produce
in other prokaryotic or eukaryotic cells.

US5804694 describes the commercial production of β-glucuronidase in plants, by
placing the β-glucuronidase gene after a ubiquitin promoter. With this construct,
0.1% of the total extracted protein is β-glucuronidase. By targeting the heterologous
protein to the endoplasmic reticulum, the accumulation can be improved, especially for
ab's and ab fragments. Using the cauliflower mosaic virus 35S promoter resulted in
expression levels of single-chain Fv (scFv) antibodies of up to 4-6.8% of total soluble
protein in leaves (Fiedler et al., 1997).






WO 02/00899 PCT/EP01/06298 

Many of the efforts to produce proteins in plants have focused on seeds.
Indeed, seeds, especially those of legumes and cereals, contain large quantities of
protein; these are mainly storage proteins, which can form up to 7-15% of the dry
weight for cereals, and up to 20-40% for legumes. Moreover, those storage proteins
are limited in number, and some of them can be responsible for up to 20% of the total
protein content in seed (Vitale & Bollini, 1995), which may have important advantages
for the purification process. In that respect, the promoters of the genes coding for the
storage proteins have been considered as ideal tools to obtain high expression levels of
proteins in seed (Fiedler et al., 1997).

US5504200 discloses the use of the phaseolin promoter for the expression of
heterologous genes in plants and plant cells. WO9113993 describes the expression of
animal genes, or the gene from brazil nut 2S storage protein, using a promoter
selected from the group consisting of the phaseolin promoter, the α'-subunit of
β-conglycinin promoter and the β-zein promoter. The gene is linked to a poly-A signal
selected from the group consisting of phaseolin poly-A signal and animal poly-A signal.
However, none of these systems leads to high heterologous protein expression.
WO9729200 describes a seed specific expression level of 1.9% of heterologous
protein on total soluble protein, using the specific legumin B4 promoter. Further
improvement of the expression cassettes led to an expression of 3-4% of scFv
antibodies in ripe tobacco seeds (Fiedler et al., 1997).

Recently, the arcelin 5I gene (arc5I) of Phaseolus vulgaris was isolated and cloned.
This gene is responsible for the production of the seed storage protein Arcelin 5a
(ARC5a) that accumulates in wild type plants up to 24-32% of the total protein content
of the seed (Goossens et al., 1994; Goossens et al., 1995). Expression of the gene in
Arabidopsis thaliana and Phaseolus acutifolius indicated that the seed storage protein
ARC5a could be expressed up to 15% and 25%, respectively, of the total soluble protein
(Goossens et al., 1999), when its own promoter and expression signals were used.
However, no evidence was shown that the promoter could give efficient expression
with other proteins. Surprisingly, we found that the use of the arcelin 5I promoter or the
phaseolin promoter, in combination with the arcelin 5I leader sequence or the Tobacco
Mosaic Virus (TMV) omega leader and the arcelin 5I 3' sequence could result in an
expression level of heterologous protein as high as 12% of the total seed protein,
which is far higher than known in the prior art.
It is a first aspect of the invention to provide a seed preferred expression cassette
having gene regulatory elements comprising

a) the arcelin promoter comprising the sequence shown in SEQ ID N° 1 or a
   phaseolin promoter comprising SEQ ID N° 5,
b) the arcelin 5I leader shown in SEQ ID N° 2 or a TMV omega leader,
c) the arcelin 5I 3' end comprising the sequence shown in SEQ ID N° 3.

Preferably, said seed preferred expression cassette comprises said arcelin promoter,
said arcelin 5I leader and said arcelin 5I 3' end.

Said seed preferred expression cassette may also comprise the sequence shown in
SEQ ID N° 4, encoding the 2S2 storage albumin signal peptide of Arabidopsis thaliana
(Krebbers et al., 1988). In order to obtain seed specific expression of a gene of
interest, said gene is placed between said leader sequence and said arcelin 3' end
sequence. Preferably, said gene of interest is fused to the sequence encoding the 2S2
storage albumin signal peptide. In one preferred embodiment, the gene of interest is a
gene encoding a scFv antibody.

It is another aspect of the invention to provide a seed preferred expression cassette
which is not prone to silencing. When using the expression cassette according to the
invention to express a gene of interest, more than 40% of the transformed lines are not
silenced and do show a high expression, preferably more than 50% of the lines are not
silenced, even more preferably more than 75% are not silenced.

It is another aspect of the invention to provide a method to obtain seed preferred
expression of a heterologous protein at a level of at least 10%, preferably a level of at
least 15%, 20%, 25%, 30%, 35% or 40% of the total soluble seed protein, with the
proviso that said heterologous protein is not an unmodified seed storage protein such
as Arcelin 5a. Preferably, said heterologous protein is not an unmodified Arcelin,
Phaseolin or Zein. In a preferred embodiment, a seed preferred expression cassette
according to the invention is used. Another preferred embodiment is a method
according to the invention, whereby said heterologous protein is a scFv.
Still another aspect of the invention is a plant cell, transformed with an expression
cassette according to the invention, or a transgenic plant comprising an expression
cassette according to the invention. Indeed, the expression cassette can be
incorporated and transformed to a plant cell or plant, using methods known to the
person skilled in the art. Said methods include, but are not limited to, Agrobacterium
T-DNA mediated transformation, particle bombardment, electroporation and direct DNA
uptake.



Definitions 

The following definitions are set forth to illustrate and define the meaning and scope of
various terms used to describe the invention herein.

Seed preferred expression means that the expression preferably takes place in the
seed, but does not exclude expression in other organs of the plant.

Gene of interest as used here means the coding sequence of a gene of which one
wants to obtain seed preferred expression.

Leader as used here means the 5' end untranslated sequence.

Signal peptide indicates the initial function of the peptide in the 2S2 albumin storage
protein but does not necessarily imply that the peptide has the same function and is
processed in the same way when it is fused to the gene of interest.

Heterologous protein refers to any protein that can be expressed in seed but which is
not an unmodified seed storage protein. In contrast, modified seed storage proteins
(specific mutants, fusion proteins, improved seed storage proteins...) are part of the
present invention and are thus included in said definition.

Brief description of the figures

Figure 1: Overview of the T-DNA vectors for the evaluation of scFv production in
seeds of Arabidopsis thaliana.
LB and RB = left and right border of the T-DNA
pVS1 = plasmid insertion of Pseudomonas aeruginosa for vector stability and
replication in Agrobacterium tumefaciens.
pBR = origin of replication in Escherichia coli
NptII = selection marker neomycin phosphotransferase II under control of the nos
promoter and the ocs 3' termination and polyadenylation signals.
Sm/SpR = bacterial resistance gene for spectinomycin and streptomycin
Figure 2: Construction of pBluescript (2S2-G4)
Figure 3: Construction of patag5 (3'-arc5I)
Figure 4: Construction of pSP72 (Parc5I/ARC5acs)
Figure 5: Amplification of DNA fragments for vector construction
Figure 6: Construction of pParc5I-G4. This vector has been, in accordance with the
Budapest Treaty, deposited with the Belgian Coordinated Collections of
Microorganisms-BCCM™ Laboratorium voor Moleculaire Biologie-Plasmidencollectie
(LMBP), Universiteit Gent, K.L. Ledeganckstraat 35, B-9000 Gent, Belgium by Dr. Ann
Depicker, Molenstraat 61, 9820 Merelbeke, Belgium (work: K.L. Ledeganckstraat 35,
9000 Gent, Belgium) and has accession number: LMBP 4128.
Figure 7: Construction of pParc5I-Ω-G4
Figure 8: Construction of pP35S-G4
Figure 9: Construction of pPβ-phas-G4

Figure 10: Results of scFv-G4 quantification in seed extracts by quantitative Western
blot. 500 ng protein of A-, P-, and Ω-seed extracts and 2.5 µg protein of 35S- and
Col 0-extracts were loaded on a 10% SDS-PAGE gel. G4 proteins were detected by
monoclonal anti-c-myc antibody and anti-mouse antibody coupled to alkaline
phosphatase. Col 0 = negative control.

Figure 11: scFv-G4 quantification in seed extracts from floral dip transformants by
quantitative Western blot. Results are shown from 10 segregating T2 seed stocks
transformed with pParc5I-G4 (upper blot), and 4 segregating T2 seed stocks
transformed with pPβ-phas-G4 (lower blot). We loaded 1 microgram seed protein for
A1, A7, F28, F31, F38, and F39; 1.5 microgram for A3, A5, A8, A15, A22, and A42;
and 2 microgram for A14 and A16 on a 10% SDS-PAGE gel. G4 proteins were
detected by monoclonal anti-c-myc antibody and anti-mouse antibody coupled to
alkaline phosphatase. M = molecular weight marker.

Figure 12: Coomassie blue stained SDS-PAGE gel showing separated Arabidopsis
seed proteins from transgenics F28 (lanes 3 and 7), F31 (lanes 4 and 8), F38 (lanes 5
and 9), F39 (lanes 6 and 10), transformed with pPβ-phas-G4, and from an
untransformed control plant (lane 2). Lanes 3, 4, 5, and 6 contain 30 microgram protein
and lanes 7, 8, 9, and 10 contain 20 microgram protein. The arrow indicates the
recombinant scFv protein band. Lane 1 contains the molecular weight marker. On the
basis of the Coomassie stained protein bands in separate lanes, Imagemaster VDS
software measured the following G4 contents for each line: 9.9% for F28 (11.3% by
Western blot), 15.4% for F31 (20.0% by Western blot), 16.0% for F38 (19.0% by
Western blot), and 9.6% for F39 (12.0% by Western blot).

Figure 13: Schematic representation of the ELISA test used to analyse the antigen
binding activity of ex planta-extracted and E. coli-extracted scFv proteins. (1) Coating
of microtiter well with monoclonal antibody 9E10 binding to the c-myc tag of the scFv;
(2) co-incubation of several amounts of scFv from seed or E. coli with an excess of the
antigen dihydroflavonol-4-reductase (DFR) from Petunia hybrida; detection of bound
DFR with (3) polyclonal antiserum against DFR from rabbit and (4) polyclonal
anti-rabbit serum coupled to alkaline phosphatase (AP).

Figure 14: Results of the analysis of antigen-binding activity of seed- and E. coli-
extracted scFv proteins by ELISA. The ELISA test (figure 13) was performed with
different G4 concentrations (X-axis) in presence or absence (controls) of DFR antigen.
Y-axis represents the ELISA signal (AFU/min).
Figure 15: Construction of patag6 (3'-arc5I).
Figure 16: Construction of pParc5I-G4bis.

Figure 17: scFv-G4 quantification in different seeds (3/2, 3/3, 3/4, and 3/5) from a
single transgenic Phaseolus acutifolius plant. From each seed extract, we loaded 3
(lanes a) or 4 (lanes b) microgram seed protein on a 10% SDS-PAGE gel. G4 proteins
were detected by monoclonal anti-c-myc antibody and anti-mouse antibody coupled to
alkaline phosphatase. M = molecular weight marker.

Examples 

Example 1: Cloning of the T-DNA vectors for the evaluation of scFv production
in Arabidopsis thaliana under control of the arc5I expression signals of
Phaseolus vulgaris

Four T-DNA vectors were constructed to evaluate and compare scFv production under
control of the arc5I expression signals (vectors pParc5I-G4 and pParc5I-Ω-G4), the
35S promoter of the cauliflower mosaic virus (vector pP35S-G4), and the promoter of
the β-phaseolin gene of Phaseolus vulgaris (vector pPβ-phas-G4) (figure 1). A first
step in the cloning procedure consisted of the construction of three pilot vectors:
pBluescript (2S2-G4) (figure 2), patag5 (3'-arc5I) (figure 3) and pSP72 (Parc5I/ARC5acs)
(figure 4).

Construction of pBluescript (2S2-G4) (figure 2):
The coding sequence of the scFv fragment G4, fused at its 3'-end to the coding
sequence of the c-myc tag (Evan et al., 1985) and the ER-retention signal KDEL
(Denecke et al., 1992), was cut from the T-DNA vector pG4ER (De Jaeger et al., 1999)
at the restriction sites NcoI and XbaI. In addition, an oligonucleotide was made that
encodes the signal sequence of the seed storage protein 2S2 from Arabidopsis
thaliana (Krebbers et al., 1988). This oligonucleotide was flanked at its 3'-end by the
'sticky' end of the restriction site NcoI and at its 5'-end by a 'sticky' end
that complements the restriction site XbaI but destroys it after ligation, followed by the
restriction sites EcoRI, BglII and HindIII. The G4 fragment and the oligonucleotide were
cloned together in the vector pBluescriptKS (Stratagene, La Jolla, CA), after cutting
with HindIII and XbaI. The sequence of the insert was checked and a clone containing a
correct insert was selected for further cloning steps. As such, we obtained the pilot
vector pBluescript (2S2-G4), which contained the G4-encoding sequence, preceded by
a series of unique restriction sites that could be used for the insertion of the different
promoters in combination with an mRNA leader sequence.
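
The idea of a 'sticky' end that complements XbaI but destroys the site after ligation can be illustrated with a short sketch. XbaI recognises TCTAGA and cuts after the first T, so a 5'-CTAG overhang re-ligates to an XbaI-cut end, and the site is only restored when the base following the overhang is A. The flanking sequences and helper names below are hypothetical and are not the sequences used in the constructs; only the recognition sequences are real.

```python
# Recognition sequences (top strand, 5'->3') of enzymes named in the text.
SITES = {
    "XbaI": "TCTAGA",     # cuts T^CTAGA, leaving a 5'-CTAG overhang
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
}


def ligate(left_top: str, right_top: str) -> str:
    """Join two top-strand sequences whose sticky ends are assumed compatible."""
    return left_top + right_top


def has_site(seq: str, enzyme: str) -> bool:
    """True if the top strand contains the enzyme's (palindromic) recognition site."""
    return SITES[enzyme] in seq


# Hypothetical vector arm left by XbaI digestion: its top strand ends with
# the T of TCTAGA.
VECTOR_LEFT = "GGATCCT"

# Hypothetical oligonucleotide top strand: CTAG overhang, then C instead of A,
# followed by EcoRI, BglII and HindIII sites.
OLIGO_DESTROYING = "CTAG" + "C" + SITES["EcoRI"] + SITES["BglII"] + SITES["HindIII"]

# Same overhang followed by A: ligation regenerates the XbaI site.
OLIGO_RESTORING = "CTAG" + "A" + SITES["EcoRI"]

destroyed = ligate(VECTOR_LEFT, OLIGO_DESTROYING)
restored = ligate(VECTOR_LEFT, OLIGO_RESTORING)

if __name__ == "__main__":
    print("XbaI site present (C after overhang):", has_site(destroyed, "XbaI"))
    print("XbaI site present (A after overhang):", has_site(restored, "XbaI"))
```

Destroying the site in this way keeps the remaining XbaI site in the construct unique for later cloning steps.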
Construction of patag5 (3'-arc5I) (figure 3):

An oligonucleotide was made to insert a few unique restriction sites in the T-DNA
vector patag4 (Goossens et al., 1999). The oligonucleotide contained the following
restriction sites from its 5'-end to its 3'-end: the 5' 'sticky' end of EcoRI, the restriction
sites XbaI, XhoI, and BglII and a 3' 'sticky' end that complements XbaI but destroys it
after ligation. The oligonucleotide was ligated in patag4, after cutting with EcoRI and
XbaI. This resulted in the vector patag5. The sequence of the inserted oligonucleotide
was checked and a clone with correct insert was selected for further cloning steps.
From the vector pBluescript (arc5I) (Goossens et al., 1995), which contains the
genomic sequence of the arc5I gene, we cut the 3' expression signals of arc5I
(3'-arc5I) by using XbaI and EcoRI. The 3'-arc5I fragment was ligated in patag5, after
cutting with XbaI and EcoRI. This resulted in the T-DNA vector patag5 (3'-arc5I).
Construction of pSP72 (Parc5I/ARC5acs) (figure 4):

The arc5I promoter and the coding sequence of the ARC5a protein (ARC5acs) were
cut from pBluescript (arc5I). This fragment was ligated in the cloning vector pSP72
(Promega, Madison, WI) after cutting with EcoRI and XbaI. This resulted in the vector
pSP72 (Parc5I/ARC5acs), containing the arc5I promoter preceded by the restriction
sites EcoRI and BglII.

Besides these three pilot vectors, four DNA fragments were amplified by PCR. These
fragments were called PCR1, PCR2, PCR3 and PCR4 (figure 5). Fragments PCR1 and
PCR2 were amplified from the 3'-end of the arc5I promoter (Parc5I) in pBluescript
(arc5I). PCR1 contained the 3'-end of Parc5I followed by the arc5I 'leader' and the
5'-end of the 2S2 signal sequence. PCR2 contained the same 3'-end of Parc5I, but
followed by the Ω 'leader' and the 5'-end of the 2S2 signal sequence. Both fragments
were flanked by the restriction sites SacI and HindIII. PCR3 was obtained by amplifying
the 35S promoter of the cauliflower mosaic virus from the vector pGEJAEI (De Jaeger
et al., 1999). At the 3'-end of the 35S promoter, we built in the arc5I 'leader', followed
by the 5'-end of the 2S2 signal sequence. PCR3 was flanked by the restriction sites
BglII and HindIII. PCR4 contained the promoter of the β-phaseolin gene of Phaseolus
vulgaris, amplified from the vector pBluescript (Pβ-phas) (van der Geest et al., 1994).
At the 3'-end of PCR4, we built in the arc5I 'leader' followed by the 5'-end of the 2S2
signal sequence. This fragment was flanked by the restriction sites XhoI and HindIII.
The pilot vectors and the four PCR fragments were used to clone the four T-DNA
vectors from figure 1.

Construction of pParc5I-G4 (figure 6):

First, the 5'-end of the arc5I promoter was cut from the vector pSP72 (Parc5I/ARC5acs)
by using BglII and SacI enzymes. This promoter fragment, together with the PCR1
fragment, was ligated in the vector pBluescript (2S2-G4), after restriction digest with
BglII and HindIII. This resulted in the vector pBluescript (Parc5I-arc5I'leader'-2S2-G4).
The sequence of the inserted PCR1 fragment was checked and a clone with correct
insert was selected for further cloning steps. Finally, the arc5I promoter with the arc5I
'leader' and the G4-coding sequence was cut from the former construct at the
restriction sites BglII and XbaI and ligated in the vector patag5 (3'-arc5I), after digestion
with BglII and XbaI. This resulted in the T-DNA vector pParc5I-G4. This plasmid,
transformed in E. coli MC1061, is deposited at BCCM under deposit number LMBP
4128. The plasmid contains the full arc5I promoter and the full 3' end of arc5I, as used
in the expression cassettes comprising the arc5I promoter and the 3' end of arc5I.
Construction of pParc5I-Ω-G4 (figure 7):

First, the 5'-end of the arc5I promoter was cut from the vector pSP72 (Parc5I/ARC5acs)
by restriction digest with BglII and SacI. This promoter fragment, together with the
PCR2 fragment, was ligated in the vector pBluescript (2S2-G4), after restriction digest
with BglII and HindIII. This resulted in the vector pBluescript (Parc5I-Ω'leader'-2S2-G4).
The sequence of the inserted PCR2 fragment was checked and a clone with correct
insert was selected for further cloning steps. Finally, the arc5I promoter with the Ω
'leader' and the G4-coding sequence was cut from the former construct at the
restriction sites BglII and XbaI and ligated in the vector patag5 (3'-arc5I), after digestion
with BglII and XbaI. This resulted in the T-DNA vector pParc5I-Ω-G4.
Construction of pP35S-G4 (figure 8):

The PCR3 fragment was ligated in the vector pBluescript (2S2-G4), after restriction
digest with BglII and HindIII. This resulted in the vector pBluescript
(P35S-arc5I'leader'-2S2-G4). The sequence of the inserted PCR3 fragment was checked
and a clone with correct insert was selected for further cloning steps. Finally, the 35S
promoter with the arc5I 'leader' and the G4-coding sequence was cut from the former
construct at the restriction sites BglII and XbaI and ligated in the vector patag5
(3'-arc5I), after digestion with BglII and XbaI. This resulted in the T-DNA vector
pP35S-G4.

Construction of pPβ-phas-G4 (figure 9):

The PCR4 fragment was ligated in the vector pBluescript (2S2-G4), after restriction
digest with XhoI and HindIII. This resulted in the vector pBluescript
(Pβ-phas-arc5I'leader'-2S2-G4). After checking the DNA sequence of the inserted PCR4
fragment, we found in the β-phaseolin promoter a few base pairs that differed from the
original sequence (Bustos et al., 1991; Genbank accession number J01263). However,
the 3'-end of the cloned promoter sequence, starting from the NdeI site, was
identical to the published sequence. Therefore, this piece, together with the
coding sequence of scFv G4, was cut from the vector pBluescript
(Pβ-phas-arc5I'leader'-2S2-G4) by using the restriction sites NdeI and XbaI. In addition,
the 5'-end of the β-phaseolin promoter was cut from pBluescript (Pβ-phas) at the
restriction sites XhoI and NdeI. Both DNA fragments were ligated in patag5 (3'-arc5I),
after restriction digest with XhoI and XbaI. This resulted in the final vector pPβ-phas-G4.
Again we checked the sequence of the β-phaseolin promoter and the same
differences were found with the original sequence. The sequence between the XhoI
site and the NdeI site, as used in the construct, is depicted in SEQ ID N° 5.

The four T-DNA vectors were purified from Escherichia coli and electroporated in
Agrobacterium tumefaciens C58C1RifR (pMP90). After colony purification, plasmids
were purified from Agrobacterium and checked. The Agrobacterium strains were used
in Arabidopsis transformation.

Example 2: Transformation of Arabidopsis thaliana and regeneration of
transgenic plants

Arabidopsis thaliana (Columbia genotype O) was transformed by root transformation
(Valvekens et al., 1988) with the constructs pParc5I-G4, pParc5I-Ω-G4, pP35S-G4 and
pPβ-phas-G4. After selection of transformed calli on kanamycin-selective medium, 150
calli were transferred to shoot inducing medium. Finally, shoots were transferred to
root inducing medium. After root formation, plants were transferred to the greenhouse
and seeds were collected from the following numbers of transgenic Arabidopsis plants:
36 for pParc5I-G4, 4 for pP35S-G4, 18 for pPβ-phas-G4, and 13 for pParc5I-Ω-G4.
In parallel, the same constructs were used for Arabidopsis transformation by 'floral dip'
(Clough & Bent, 1998). Transformed T1 plants were selected on kanamycin-containing
selective medium, transferred to the greenhouse, and seeds were collected.

Example 3: scFv accumulation in transgenic Arabidopsis seeds

Seed extraction and protein quantification

Crude seed protein extracts were obtained following a modification of the extraction
protocol of van der Klei et al. (1995) (Goossens et al., 1999). Ground seeds were
extracted twice with hexane to remove lipids. The residue was lyophilized and
subsequently extracted twice with 50 mM Tris/HCl, 200 mM NaCl, 5 mM EDTA, 0.1%
Tween 20, pH 8 (Fiedler et al., 1997) for 15 min at room temperature under continuous
shaking. To prevent protein degradation, a protease inhibitor mix (2x Complete™,
Roche Molecular Biochemicals, Germany) was added to the extraction buffer. The
pellet was removed by centrifugation at 20000 x g and supernatants were pooled. Total
protein quantity in the crude extracts was determined by the Lowry method using the
DC Protein Assay (BioRad, Hercules, USA) with BSA as a standard (Table 1). The
reliability of Arabidopsis seed protein quantification by this method was proven by
Goossens et al., 1999.

Table 1: Total protein concentration in transgenic seed extracts (500
microliter) from 10 mg transgenic Arabidopsis seeds

A1            A105          A140          A143          A165A
3.665 µg/µl   3.395 µg/µl   3.042 µg/µl   3.708 µg/µl   3.675 µg/µl

P3A           P5            P15           P22A          P102B
2.852 µg/µl   4.028 µg/µl   3.623 µg/µl   3.151 µg/µl   3.339 µg/µl

Ω7A           Ω33           Ω65C          Ω98A          Ω130
3.873 µg/µl   3.527 µg/µl   3.184 µg/µl   3.754 µg/µl   3.517 µg/µl

35S 93        35S 101       35S 116       35S 131A      Col 0
3.945 µg/µl   3.837 µg/µl   3.879 µg/µl   3.906 µg/µl   3.215 µg/µl

Col 0 = non-transformed Columbia genotype O used as a control
A = pParc5I-G4 transformed Arabidopsis line
P = pPβ-phas-G4 transformed Arabidopsis line
35S = pP35S-G4 transformed Arabidopsis line
Ω = pParc5I-Ω-G4 transformed Arabidopsis line
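
As a quick plausibility check on Table 1, the concentrations can be converted into the fraction of seed mass recovered as protein, and into the extract volume needed for a given gel load. The sketch below uses the Col 0 control value (3.215 µg/µl), the 500 microliter extract volume and the 10 mg seed input stated for Table 1; the function names are illustrative only.

```python
def extracted_protein_fraction(conc_ug_per_ul: float,
                               extract_volume_ul: float = 500.0,
                               seed_mass_mg: float = 10.0) -> float:
    """Fraction of the seed mass that is recovered as protein in the extract."""
    protein_mg = conc_ug_per_ul * extract_volume_ul / 1000.0  # µg -> mg
    return protein_mg / seed_mass_mg


def volume_for_load(load_ug: float, conc_ug_per_ul: float) -> float:
    """Volume (µl) of extract that contains the requested protein mass."""
    return load_ug / conc_ug_per_ul


COL0_CONC = 3.215  # µg/µl, Col 0 control extract (Table 1)

if __name__ == "__main__":
    fraction = extracted_protein_fraction(COL0_CONC)
    print(f"Protein recovered: {fraction:.1%} of seed mass")
    print(f"Volume for a 2.5 µg load: {volume_for_load(2.5, COL0_CONC):.2f} µl")
```

With these inputs, roughly one sixth of the seed mass is recovered as protein in the extract.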



3.2. Quantification of scFv-G4 accumulation

Proteins were separated on SDS-PAGE and accumulation levels of scFv-G4 proteins
were determined by quantitative Western blot analysis using the anti-c-myc
monoclonal antibody 9E10 (Evan et al., 1995) followed by anti-mouse IgG coupled to
alkaline phosphatase (Sigma, St Louis, MO, USA), according to De Jaeger et al., 1999
(figure 10). Different amounts of scFv-G4 proteins purified from Escherichia coli (De
Jaeger et al., 1999) were used as standards. The constructs pParc5I-G4, pParc5I-Ω-G4,
and pPβ-phas-G4 give very high scFv-G4 accumulation levels in Arabidopsis
seeds, in the range of 10% of total soluble seed protein (Table 2). These are the highest
levels ever reported for scFv proteins produced in plants. Moreover, lines with such high
levels were easily found, as only 5 lines were screened for each construct, which is an
indication that the expression cassettes are not very sensitive to silencing.
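
The conversion from a Western blot band to a percentage of total soluble protein can be sketched as a standard-curve calculation. The example below is a minimal illustration that assumes a linear relation between band signal and the amount of purified scFv-G4 standard; the standard amounts, signal values and function names are hypothetical and are not data from this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_of_tsp(signal, slope, intercept, loaded_protein_ng):
    """Band signal -> scFv mass via the standard curve -> % of loaded protein."""
    scfv_ng = (signal - intercept) / slope
    return 100.0 * scfv_ng / loaded_protein_ng


# Hypothetical standards: ng of purified scFv-G4 and band signals (arbitrary units).
STANDARD_NG = [10.0, 25.0, 50.0, 100.0]
STANDARD_SIGNAL = [25.0, 55.0, 105.0, 205.0]

slope, intercept = fit_line(STANDARD_NG, STANDARD_SIGNAL)

if __name__ == "__main__":
    # Hypothetical sample band from a lane loaded with 500 ng seed protein.
    level = percent_of_tsp(105.0, slope, intercept, 500.0)
    print(f"scFv-G4: {level:.1f}% of total soluble protein")
```

Sample bands are only meaningful when they fall within the range of the standards, which is one reason to load less protein for high-expressing lines than for 35S lines or controls.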

Table 2: ScFv-G4 accumulation levels in transgenic Arabidopsis seeds.

pParc5I-G4           pPβ-phas-G4          pParc5I-Ω-G4         pP35S-G4
Line    G4-level (*) Line    G4-level (*) Line    G4-level (*) Line      G4-level (*)
A1      <4%          P3A     10%          Ω7A     12%          35S 93    <0.8%
A105    10%          P5      10%          Ω33     <4%          35S 101   3%
A140    6%           P15     <4%          Ω65C    6%           35S 116   <0.8%
A143    8%           P22A    12%          Ω98A    <4%          35S 131A  <0.8%
A165A   8%           P102B   12%          Ω130    <4%

(*) accumulation level as % of total soluble protein content in transgenic seeds



Example 4: Transformation of Arabidopsis thaliana by 'floral dip' and
regeneration of transgenic plants

Arabidopsis thaliana (Columbia genotype O) was transformed by 'floral dip' (Clough &
Bent, 1998) with the constructs pParc5I-G4, pParc5I-Ω-G4, pP35S-G4, and
pPβ-phas-G4. Transformed T1 plants were selected on kanamycin-containing selective
medium, transferred to the greenhouse, and T2-segregating seed stocks were collected.

ScFv accumulation in transgenic T2-segregating seed stocks

Seed extraction and protein quantification
Seed extraction and protein quantification were performed as described for Example 3.

Quantification of scFv-G4 accumulation

Proteins were separated on SDS-PAGE and accumulation levels of scFv-G4 proteins
were determined by quantitative Western blot analysis using the anti-c-myc
monoclonal antibody 9E10 (Evan et al., 1995) followed by anti-mouse IgG coupled to
alkaline phosphatase (Sigma, St Louis, MO, USA), according to De Jaeger et al., 1999
(figure 11). Different amounts of scFv-G4KDEL proteins purified from Escherichia coli
were used as standards. Most seed stocks were analysed at least twice. The
constructs pParc5I-G4 and pPβ-phas-G4 give very high scFv-G4 accumulation levels
in Arabidopsis seeds, in the range of 5-10% and 10-20% of total soluble seed
protein, respectively (Table 3). Use of the arc5I untranslated leader in pParc5I-G4 or
the TMV omega leader in pParc5I-Ω-G4 gives similar accumulation levels (Table 3),
showing that both leaders allow efficient translation start in seeds. In addition, variation
between transgenic lines is low for all four constructs, which is an indication that the
expression cassettes are not very sensitive to silencing. Seed extracts of the four
pPβ-phas-G4 plant lines with the highest G4 accumulation were further analysed by
SDS-PAGE and Coomassie blue staining (figure 12). A clear scFv protein band could
be identified at the expected size, which was absent in the untransformed control line.
By using the Imagemaster VDS software (Pharmacia, Uppsala, Sweden), the scFv
percentage of total soluble seed protein was measured in each lane. Similar scFv
accumulation levels were obtained for those lines as with the quantitative Western blot
analysis.
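
The agreement between the two quantification methods can be put in numbers using the values given in the legend of figure 12. The sketch below simply computes the ratio of the Coomassie densitometry estimate to the Western blot estimate for each line; the variable and function names are illustrative.

```python
# (Coomassie densitometry %, Western blot %) per line, from the legend of figure 12.
G4_LEVELS = {
    "F28": (9.9, 11.3),
    "F31": (15.4, 20.0),
    "F38": (16.0, 19.0),
    "F39": (9.6, 12.0),
}


def coomassie_to_western_ratio(levels):
    """Ratio of the densitometry estimate to the Western blot estimate per line."""
    return {line: coomassie / western for line, (coomassie, western) in levels.items()}


ratios = coomassie_to_western_ratio(G4_LEVELS)
mean_ratio = sum(ratios.values()) / len(ratios)

if __name__ == "__main__":
    for line, ratio in ratios.items():
        print(f"{line}: Coomassie / Western = {ratio:.2f}")
    print(f"mean ratio = {mean_ratio:.2f}")
```

For all four lines the densitometry estimate is somewhat lower than the Western blot estimate, while remaining in the same range.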



Table 3: ScFv-G4 accumulation levels in transgenic Arabidopsis segregating T2
seed stocks. 20 independent seed stocks were analysed; scFv levels were ranked
from highest to lowest.

pParc5I-G4 (*)   pPβ-phas-G4 (*)   pParc5I-Ω-G4 (*)   pP35S-G4 (*)
8.0 ± 1.4        20.0 ± 0.0        5.8 ± 0.4          1.1 ± 0.1
7.8 ± 0.4        19.0 ± 1.4        4.9 ± 0.2          1.1 ± 0.1
6.7 ± 0.1        12.0 ± 0.0        4.5 ± 0.7          1.1 ± 0.1
6.4 ± 0.5        11.3 ± 1.8        4.0 ± 0.0          1.0 ± 0.0
5.3 ± 1.8        7.2 ± 0.2         3.9 ± 0.2          1.0 ± 0.0
4.7 ± 0.5        5.4 ± 0.9         3.9 ± 0.2          0.8 ± 0.1
4.5 ± 0.7        5.0 ± 0.0         3.5 ± 0.2          0.7 ± 0.1
3.9 ± 0.2        4.5 ± 0.7         3.2 ± 0.2          0.7 ± 0.1
3.8 ± 0.4        4.4 ± 0.5         3.1 ± 0.6          0.7 ± 0.1
3.7 ± 0.5        4.0 ± 1.0         3.0 ± 0.0          0.7 ± 0.0
2.4 ± 0.5        3.9 ± 0.2         3.0 ± 0.0          0.7 ± 0.1
1.7 ± 0.5        3.8 ± 0.3         2.7 ± 0.4          0.7 ± 0.1
1.3 ± 0.4        1.8 ± 0.4         1.6 ± 0.4          0.6 ± 0.0
0.9 ± 0.2        1.8 ± 0.3         1.5                0.5 ± 0.1
0.8 ± 0.0        1.6               1.4 ± 0.1          0.5 ± 0.0
0.5 ± 0.1        1.3               0.7 ± 0.0          0.4 ± 0.0
0.5 ± 0.1        1.2               0.6 ± 0.1          0.3 ± 0.0
0.4 ± 0.1        0.8               0.3                0.2 ± 0.1
<0.3             <0.4              0.2                0.1
<0.3             n.d.              0.1                n.d.

(*) scFv accumulation level as % of total soluble protein content in transgenic
seeds, with standard deviation. n.d. = not detectable

ScFv accumulation in transgenic T3-homozygous seed stocks
For each construct, 10 T2-seed stocks containing highest G4-Ievels were genetically 
screened by segregation analysis. 72 seeds from each seed stock were germinated on 
kanamycin-containing selective medium and by statistical analysis we identified plant 
5 lines containing a single T-DNA locus. For the constructs pParc5l-G4 and pPp-phas- 
G4, 4 lines containing a single T-DNA locus were chosen to grow and select 
homozygous seed stocks. 10 T2-plants per line were grown, T3-seeds were collected 
and homozygous T3-seed stocks were selected by growing plants on kanamycin- 
containing selective medium. For one of the four lines containing construct pParc5l- 

10 G4, no homozygous seed stocks were found. G4-accumulation was measured by 
quantitative Western blot in T3-segregating and T3-homozygous seed stocks. 
Most pParc5I-G4 and all pPp-phas-G4 homozygous seed stocks gave higher G4
levels than the corresponding T3-segregating stocks (Table 4). For both constructs,
pParc5I-G4 and pPp-phas-G4, homozygous seed stocks were obtained containing
more than 10% G4 of total soluble seed protein (TSP). Homozygous seed stocks
transformed with pPp-phas-G4 contained extraordinarily high G4 levels, up to 36.5% of
TSP in seeds. This is the highest heterologous protein level ever reported for plants.



Table 4: ScFv-G4 accumulation levels in transgenic Arabidopsis segregating
and homozygous T3-seed stocks.

Plant line   T2-segregating     T3-segregating     T3-homozygous
             seed stocks (*)    seed stocks (*)    seed stocks (*)

A1           8.0 ± 1.4          8.2 ± 1.0          4.4 ± 0.8
                                6.7 ± 0.7          12.5 ± 1.9

A15          4.7 ± 0.5          5.6 ± 1.0          6.4 ± 1.4
                                4.7 ± 0.0          7.7 ± 1.4

A16          4.5 ± 0.7          3.8 ± 0.7          6.0 ± 1.0
                                5.1 ± 1.4          3.5 ± 0.4

F3           4.5 ± 0.7          5.8                18.0

F24          5.0 ± 0.0          7.0 ± 1.4          13.5 ± 2.1
                                4.9 ± 0.2          15.0 ± 1.4

F28          11.3 ± 1.8         13.0 ± 1.4         21.0 ± 4.2
                                10.5 ± 0.7         18.5 ± 0.7

F38          19.0 ± 1.4         17.5 ± 0.7         36.5 ± 3.4
                                17.5 ± 3.5

(*) scFv accumulation level as % of total soluble protein content in transgenic
seeds, with standard deviation; n.d. = not detectable. For most lines, more than
one T3-segregating and homozygous seed stock was analysed. A1, A15,
and A16 are plant lines transformed with construct pParc5I-G4. Lines F3, F24,
F28, and F38 were transformed with pPp-phas-G4.



Analysis of scFv quality in seed extracts

Antigen-binding activity of seed-extracted G4 proteins was measured by ELISA and
compared with that of E. coli-extracted G4. We used seed extract of the 'phas' seed stock
containing 36.5% G4 (Table 4). Different amounts of scFv were incubated with an
excess of the antigen dihydroflavonol-4-reductase, and bound antigen was measured by
sandwich ELISA (figure 13). ELISA signal curves were set up for both the bacterial
and the plant scFv and compared. The curves overlap each other (figure 14), indicating
that the plant-produced scFv has antigen-binding activity similar to that of the bacterial scFv.



Example 5: Transformation of Phaseolus acutifolius and scFv accumulation in
transgenic segregating bean seeds

Phaseolus acutifolius TB1 was transformed with pParc5I-G4bis (figure 16). pParc5I-G4bis
contains the same T-DNA as pParc5I-G4, except that it contains an additional
P35S-GUS construct for segregation analysis of transgenic plants.

For the construction of this T-DNA vector, we first made the pilot construct
patag6(3'-arc5I) (figure 15) according to the cloning procedure for patag5(3'-arc5I) (figure 3).
An oligonucleotide was made to insert a few unique restriction sites in the T-DNA
vector patag3 (Goossens et al., 1999). The oligonucleotide contained the following
restriction sites from its 5'-end to its 3'-end: the 5'-'sticky' end of EcoRI, the restriction
sites XbaI, XhoI, and BglII, and a 3'-'sticky' end that complements XbaI but destroys it
after ligation. The oligonucleotide was ligated in patag3, after cutting with EcoRI and
XbaI. This resulted in the vector patag6. The sequence of the inserted oligonucleotide
was checked and a clone with the correct insert was selected for further cloning steps.
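
How a compatible end can destroy a restriction site on ligation can be illustrated with a short sketch. The recognition sequences below are the standard ones for these enzymes; the insert sequence, the placeholder vector ends and the helper name `count_sites` are hypothetical and are not the oligonucleotide or vector sequences used here. XbaI cuts T^CTAGA, so the site is restored only when the base preceding the CTAG overhang is a T.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA", "XhoI": "CTCGAG", "BglII": "AGATCT"}


def count_sites(seq):
    """Count occurrences of each recognition sequence on the top strand."""
    return {name: seq.count(site) for name, site in SITES.items()}


# Placeholder vector ends (N = any base), top strand only:
vector_left = "NNNNG"        # end left by EcoRI (G^AATTC)
vector_right = "CTAGANNNN"   # end left by XbaI (T^CTAGA)

# Hypothetical insert, NOT the oligonucleotide of the source:
# EcoRI overhang, XbaI, XhoI and BglII sites, then a final base before the overhang.
core = "AATTC" + "TCTAGA" + "CTCGAG" + "AGATCT"

destroyed = vector_left + core + "G" + vector_right    # G + CTAGA: no XbaI site
regenerated = vector_left + core + "T" + vector_right  # T + CTAGA: XbaI site restored

print(count_sites(destroyed))    # each site occurs once, so all are unique
print(count_sites(regenerated))  # XbaI occurs twice, so it is no longer unique
```

Under these assumptions, the new XbaI site inside the insert stays unique because the junction with the vector no longer reads TCTAGA.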

From the vector pBluescript (arc5I) (Goossens et al., 1995), which contains the
genomic sequence of the arc5I gene, we cut the 3'-expression signals of arc5I
(3'-arc5I) by using XbaI and EcoRI. The 3'-arc5I fragment was ligated in patag6, after
cutting with XbaI and EcoRI. This resulted in the vector patag6(3'-arc5I). This pilot
construct was used to make pParc5I-G4bis (figure 16). The arc5I leader and the
G4-coding sequence were cut from the vector pBluescript (Parc5I-arc5I leader-2S2-G4)
(figure 6) by the restriction sites BglII and XbaI and ligated in the vector
patag6(3'-arc5I), after digestion with BglII and XbaI. This resulted in the T-DNA vector
pParc5I-G4bis.

Three transgenic plants were obtained by using the protocol of Dillen et al. (1997). As
the three plants were regenerated from the same callus, it was expected that they
were clones from the same transformation event. Seeds were collected and protein
extracts were made from separate seeds in the same buffer as the Arabidopsis seed
extraction, and according to Goossens et al. (1999). Total soluble protein
concentration was measured spectrophotometrically at 280 nm and G4 accumulation
was determined by quantitative Western blot.

G4 was detected as a single protein band (figure 17) in, on average, 3 of 4 seeds for
all three segregating seed stocks. Therefore, these transformants most probably
contain a single T-DNA locus. All analysed G4-accumulating seeds contained 2-2.5%
G4 of total soluble protein, or 2-2.5 milligram scFv per gram fresh weight seed. So far,
only one paper has reported scFv production in leguminous species. Perrin et al. (2000)
obtained 9 microgram scFv per gram fresh weight in pea seeds with the legA
promoter. The accumulation with the arc5I promoter construct is thus 4 to 5 hundred
times higher than the reported levels with the legA promoter. Because we obtained only
one transgenic plant line, plant lines with even higher scFv levels can most probably be
obtained. Goossens et al. (1999) already obtained 4 times higher levels of ARC5I
protein in homozygous seed stocks compared to segregating seed stocks, under
control of the same arc5I promoter. As such, after obtaining more transgenic
lines and selecting homozygous lines, we expect to reach 10% scFv levels in
Phaseolus acutifolius by using this arc5I promoter construct.

Claims

1. A seed preferred expression cassette having gene regulatory elements
comprising:

- the arcelin promoter comprising the sequence shown in SEQ ID N° 1,
- the arcelin 5I leader shown in SEQ ID N° 2, and
- the arcelin 5I 3' end comprising the sequence shown in SEQ ID N° 3.

2. A seed preferred expression cassette according to claim 1 wherein said arcelin
promoter is replaced by a phaseolin promoter comprising the sequence shown
in SEQ ID N° 5.

3. A seed preferred expression cassette according to claim 1 wherein a TMV
omega leader replaces said arcelin 5I leader.

4. A seed preferred expression cassette according to any of claims 1-3 further
comprising the sequence shown in SEQ ID N° 4 encoding the 2S2 storage
albumin signal peptide.

5. A seed preferred expression cassette according to any of claims 1-4 whereby a
gene of interest is placed between said leader sequence and said arcelin 5I 3'
sequence.

6. A seed preferred expression cassette according to claim 5, whereby said gene
of interest is fused to the sequence encoding the 2S2 storage albumin signal
peptide.

7. A seed preferred expression cassette according to claim 5 or 6, whereby the
gene of interest encodes a single chain Fv fragment.

8. A method to obtain seed preferred expression of a heterologous protein at a
level of at least 10% of the total soluble seed protein, with the proviso that said
heterologous protein is not an unmodified seed storage protein.

9. A method according to claim 8 wherein said level is at least 15% of the total
soluble seed protein.

10. A method according to any of claims 8-9, using an expression cassette
according to any of claims 1-7.

11. A plant cell transformed with an expression cassette according to any of the
claims 1-7.

12. A transgenic plant comprising an expression cassette according to any of the
claims 1-7.

Figure 1 (cont)
Figure 3
Figure 4
Figure 5

Page 2 of Form BCCM™/LMBP/BP/4/00-16 Receipt in the case of an original deposit



II. Scientific description and/or proposed taxonomic designation

The microorganism identified under I above was accompanied by:

(mark with a cross the applicable box(es))

- a scientific description yes no □

- a proposed taxonomic designation yes Q no |3

III. Receipt and acceptance

This International Depositary Authority accepts the microorganism identified under I
above, which was received by it on (date of original deposit): June 19, 2000



IV. International Depositary Authority 



Belgian Coordinated Collections of Microorganisms (BCCM™)

Laboratorium voor Moleculaire Biologie - Plasmidencollectie (LMBP)

Universiteit Gent

K.L. Ledeganckstraat 36 

B-9000 Gent, Belgium 



Signature(s) of person(s) having the power to represent the International Depositary
Authority or of authorized official(s):

Page 1 of Form BCCM™/LMBP/BP/9/00-16 Viability statement

Budapest Treaty on the International Recognition of the Deposit of Microorganisms for
the Purposes of Patent Procedure

Viability statement issued pursuant to Rule 10.2 by the International Depositary
Authority BCCM™/LMBP identified on the following page

International Form BCCM™/LMBP/BP/9/00-16

To : Party to whom the viability statement is issued:
Name : Philippe Jacobs

Address : Rijvisschestraat 120

B-9052 Zwijnaarde 

I. Depositor: 

I.1 Name : Depicker Ann

I.2 Address : K.L. Ledeganckstraat 35

9000 Gent 

II. Identification of the microorganism:

II.1 Accession number given by the International Depositary Authority:

LMBP4128

II.2 Date of the original deposit (or where a new deposit or a transfer has been
made, the most recent relevant date) : June 19, 2000

III. Viability statement. 

The viability of the microorganism identified under II above was tested on 

: June 22, 2000 

(Give date. In the cases referred to in Rule 10.2(a)(ii) and (iii), refer to the most recent
viability test).

On that date, the said microorganism was: (mark the applicable box with a cross)

[x] viable

[ ] no longer viable



PCT/EP01/06298 

Conditions under which the viability test has been performed:

(Fill in if the information has been requested and if the results of the test were 
negative). 



V. International Depositary Authority 



Belgian Coordinated Collections of Microorganisms (BCCM™) 

Laboratorium voor Moleculaire Biologie - Plasmidencollectie (LMBP)

Universiteit Gent 

K.L. Ledeganckstraat 36 

B-9000 Gent, Belgium 



Signature(s) of person(s) having the power to represent the International Depositary
Authority or of authorized official(s):




Budapest Treaty on the International Recognition of the Deposit of Microorganisms for
the Purposes of Patent Procedure

Receipt in the case of an original deposit issued pursuant to Rule 7.1 by the
International Depositary Authority BCCM™/LMBP identified at the bottom of next page

To : Name of the depositor : Depicker Ann



Address : K.L. Ledeganckstraat 35 

9000 Gent 



I. Identification of the microorganism: 

I.1 Identification reference given by the depositor:

MC1061(pParc5l-G4) 



I.2 Accession number given by the International Depositary Authority:



LMBP4128 



